5 research outputs found

    Learning Deep Features for Scene Recognition using Places Database

    Scene recognition is one of the hallmark tasks of computer vision, allowing definition of a context for object recognition. Whereas the tremendous recent progress in object recognition tasks is due to the availability of large datasets like ImageNet and the rise of Convolutional Neural Networks (CNNs) for learning high-level features, performance at scene recognition has not attained the same level of success. This may be because current deep features trained from ImageNet are not competitive enough for such tasks. Here, we introduce a new scene-centric database called Places with over 7 million labeled pictures of scenes. We propose new methods to compare the density and diversity of image datasets and show that Places is as dense as other scene datasets and has more diversity. Using CNN, we learn deep features for scene recognition tasks, and establish new state-of-the-art results on several scene-centric datasets. A visualization of the CNN layers' responses allows us to show differences in the internal representations of object-centric and scene-centric networks.
    Funding: National Science Foundation (U.S.) (Grant 1016862); United States. Office of Naval Research, Multidisciplinary University Research Initiative (N000141010933); Google (Firm); Xerox Corporation; Grant TIN2012-38187-C03-02; United States. Intelligence Advanced Research Projects Activity (United States. Air Force Research Laboratory Contract FA8650-12-C-7211).
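    As a rough illustration of the learning setup described in the abstract above, the sketch below fine-tunes an ImageNet-pretrained CNN on a scene-centric image folder. The architecture, directory layout, class count, and hyperparameters are assumptions for illustration, not the paper's exact configuration (the original work trained its own networks directly on Places).

```python
# Minimal sketch: fine-tuning an ImageNet-pretrained CNN for scene recognition.
# Paths, class count, and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_SCENE_CLASSES = 365  # assumed number of scene categories in the target dataset

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Hypothetical folder of scene images arranged as <root>/<category>/<image>.jpg
train_set = datasets.ImageFolder("places_train/", transform=preprocess)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, NUM_SCENE_CLASSES)  # new scene head

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:          # one pass over the data, for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```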

    Places: A 10 Million Image Database for Scene Recognition

    The rise of multi-million-item dataset initiatives has enabled data-hungry machine learning algorithms to reach near-human semantic classification performance at tasks such as visual object and scene recognition. Here we describe the Places Database, a repository of 10 million scene photographs, labeled with scene semantic categories, comprising a large and diverse list of the types of environments encountered in the world. Using state-of-the-art Convolutional Neural Networks (CNNs), we provide scene classification CNNs (Places-CNNs) as baselines that significantly outperform previous approaches. Visualization of the CNNs trained on Places shows that object detectors emerge as an intermediate representation of scene classification. With its high coverage and high diversity of exemplars, the Places Database along with the Places-CNNs offers a novel resource to guide future progress on scene recognition problems.
    Keywords: scene classification; visual recognition; deep learning; deep feature; image dataset.
    Funding: National Science Foundation (U.S.) (Grant 1016862); National Science Foundation (U.S.) (Grant 1524817); United States. Assistant Secretary of Defense for Research and Engineering, Basic Research Office (United States. Office of Naval Research, Grant N00014-16-1-3116).
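    For concreteness, a minimal inference sketch with a Places-CNN-style baseline is shown below. The checkpoint filename, class count, and preprocessing are assumptions; the released Places-CNNs ship with their own weights and category list, which would replace these placeholders.

```python
# Minimal sketch: scene classification with a Places-CNN-style baseline.
# The checkpoint path and class count are hypothetical placeholders.
import torch
from torchvision import models, transforms
from PIL import Image

NUM_SCENE_CLASSES = 365  # assumed number of scene categories

model = models.resnet50(weights=None)
model.fc = torch.nn.Linear(model.fc.in_features, NUM_SCENE_CLASSES)
state = torch.load("places_cnn_checkpoint.pth", map_location="cpu")  # hypothetical file
model.load_state_dict(state)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("example_scene.jpg")).unsqueeze(0)
with torch.no_grad():
    probs = torch.softmax(model(img), dim=1)
top_prob, top_idx = probs.topk(5)
print(top_idx, top_prob)  # indices map to scene categories via the dataset's label list
```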

    Business Links

    SIGLE. Available from British Library Document Supply Centre (DSC:5188.35(61)) / BLDSC - British Library Document Supply Centre. GB, United Kingdom.

    Understanding the role of individual units in a deep neural network

    Deep neural networks excel at finding hierarchical representations that solve complex tasks over large datasets. How can we humans understand these learned representations? In this work, we present network dissection, an analytic framework to systematically identify the semantics of individual hidden units within image classification and image generation networks. First, we analyze a convolutional neural network (CNN) trained on scene classification and discover units that match a diverse set of object concepts. We find evidence that the network has learned many object classes that play crucial roles in classifying scene classes. Second, we use a similar analytic method to analyze a generative adversarial network (GAN) model trained to generate scenes. By analyzing changes made when small sets of units are activated or deactivated, we find that objects can be added and removed from the output scenes while adapting to the context. Finally, we apply our analytic framework to understanding adversarial attacks and to semantic image editing.
    Funding: Defense Advanced Research Projects Agency (Award FA8750-18-C-0004); NSF (Grants 1524817 and BIGDATA-1447476).
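    The core measurement behind network dissection can be sketched as an intersection-over-union score between a unit's thresholded activation map and a concept's segmentation mask. The function below is a simplified, single-image version of that idea; the thresholding quantile and data handling are assumptions, and the full procedure aggregates these statistics over a large probe dataset.

```python
# Minimal sketch of the network-dissection idea: score how well one unit's
# thresholded activation map overlaps (IoU) with a concept segmentation mask.
import numpy as np

def unit_concept_iou(activation_map, concept_mask, quantile=0.995):
    """activation_map: 2D array of one unit's activations, upsampled to image size.
    concept_mask: binary 2D array marking pixels that belong to a concept."""
    threshold = np.quantile(activation_map, quantile)   # keep only top activations
    unit_mask = activation_map > threshold
    intersection = np.logical_and(unit_mask, concept_mask).sum()
    union = np.logical_or(unit_mask, concept_mask).sum()
    return intersection / union if union > 0 else 0.0

# A unit is labeled with the concept whose masks it overlaps best; summing
# intersections and unions over many probe images gives a more stable score
# than any single image.
```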

    Explainable, automated urban interventions to improve pedestrian and vehicle safety

    At the moment, urban mobility research and governmental initiatives are mostly focused on motor-related issues, e.g. the problems of congestion and pollution. And yet, we cannot disregard the most vulnerable elements in the urban landscape: pedestrians, exposed to higher risks than other road users. Indeed, safe, accessible, and sustainable transport systems in cities are a core target of the UN's 2030 Agenda. Thus, there is an opportunity to apply advanced computational tools to the problem of traffic safety, especially with regard to pedestrians, who have often been overlooked in the past. This paper combines public data sources, large-scale street imagery, and computer vision techniques to approach pedestrian and vehicle safety with an automated, relatively simple, and universally applicable data-processing scheme. The steps involved in this pipeline include the adaptation and training of a Residual Convolutional Neural Network to determine a hazard index for each given urban scene, as well as an interpretability analysis based on image segmentation and class activation mapping on those same images. Combined, the outcome of this computational approach is a fine-grained map of hazard levels across a city and a heuristic to identify interventions that might simultaneously improve pedestrian and vehicle safety. The proposed framework should be taken as a complement to the work of urban planners and public authorities.
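    A minimal sketch of the class activation mapping (CAM) step mentioned in this abstract is given below: it projects the final linear layer's weights for the predicted class onto the last convolutional feature maps to highlight the image regions that drive a prediction. The network (a torchvision ResNet-18 with ImageNet weights), the hooked layer, and the random stand-in input are assumptions, not the paper's trained hazard model.

```python
# Minimal sketch of class activation mapping (CAM) for a ResNet-style network
# with global average pooling followed by a linear head. Model and input are
# illustrative stand-ins, not the paper's hazard-index model.
import torch
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")
model.eval()

features = {}
def hook(_module, _inp, out):
    features["maps"] = out.detach()

model.layer4.register_forward_hook(hook)  # last conv block in torchvision ResNets

img = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed street-level image
logits = model(img)
class_idx = logits.argmax(dim=1).item()

fmap = features["maps"][0]                      # (C, H, W) feature maps
weights = model.fc.weight[class_idx]            # (C,) weights for the predicted class
cam = torch.einsum("c,chw->hw", weights, fmap)  # weighted sum over channels
cam = torch.relu(cam)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
# `cam` can now be upsampled to the input size and overlaid on the image.
```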